Cheaper by the Dozen: Batched Algorithms
Authors
Abstract
While computing power and memory size have been steadily increasing as predicted by Moore’s Law, they are still dwarfed by the massive data sets that a number of applications produce. Problems arising from astrophysics, computational biology, telecommunications, and the Internet often come with data in the terabyte range. Analyzing this data with classical algorithms is often prohibitively expensive, so new ideas are needed to create algorithms that can deal with these massive data sets. In this paper we develop the idea of batching, that is, processing several queries at a time, to obtain more efficient algorithms for several query problems. The advantages of our algorithms over the classical approach of putting the massive data set into a data structure are threefold: improved asymptotic performance, significantly smaller data structures, and a number of I/Os that is linear in the size of the massive data set. We use two techniques, query data structures and sampling, in the design of our batched algorithms.

In addition, we believe that batched algorithms have many practical implications. Consider a webpage that answers queries on a large data set. Instead of answering these queries one at a time, which can result in a substantial bottleneck, we wait for several queries to accumulate and then apply a batched algorithm that can answer them significantly faster.

To illustrate the idea of batched algorithms, we consider the dictionary problem. Suppose we begin with n unsorted items. If we have only one query, it does not make sense to place the n items in a data structure; the best we can do is the brute-force method of comparing the query with all n items. Now suppose we have b queries. If b is large enough and we have enough space, it makes sense to build a data structure such as a binary tree or a perfect hash table. However, if 1 < b << n, we can do better. We simply sort the list of the b
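The abstract is cut off before the batched approach is fully described. As a minimal sketch only, assuming the idea is to sort the b queries and then binary-search each of the n items against that sorted list (an assumption, not something the text above confirms), the dictionary example might look like this in Python. The function name `batched_lookup` and its return format are hypothetical, chosen for illustration.

```python
from bisect import bisect_left
from typing import Dict, Iterable


def batched_lookup(items: Iterable[int], queries: Iterable[int]) -> Dict[int, bool]:
    """Report, for each query, whether it occurs among the items.

    Assumed approach (not taken from the source): keep only the queries
    in memory, sorted, and stream over the items exactly once.
    """
    # Sort the distinct queries: O(b log b) time, O(b) space.
    sorted_queries = sorted(set(queries))
    found = [False] * len(sorted_queries)

    # Single pass over the (possibly huge, unsorted) items.
    for item in items:
        # Binary search for the item among the sorted queries: O(log b).
        pos = bisect_left(sorted_queries, item)
        if pos < len(sorted_queries) and sorted_queries[pos] == item:
            found[pos] = True

    return dict(zip(sorted_queries, found))


if __name__ == "__main__":
    # The items can be any iterable, e.g. a generator reading from disk.
    print(batched_lookup(iter([5, 3, 9, 1]), [3, 4, 9]))
```

Under this assumption the only data structure is the sorted list of b queries, the items are read in one pass, and the work is O((n + b) log b) comparisons, compared with O(nb) for answering each query by brute force.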
Related resources
Cheaper by the Dozen: Using Sibling Discounts at Catholic Schools to Estimate the Price Elasticity of Private School Attendance
Design of a Hybrid Genetic Algorithm for Parallel Machines Scheduling to Minimize Job Tardiness and Machine Deteriorating Costs with Deteriorating Jobs in a Batched Delivery System
This paper studies the parallel machine scheduling problem subject to machine and job deterioration in a batched delivery system. By the machine deterioration effect, we mean that each machine deteriorates over time, at a different rate. Moreover, job processing times are increasing functions of their starting times and follow a simple linear deterioration. The objective functions are minimizin...
A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations
As modern hardware keeps evolving, an increasingly effective approach to develop energy efficient and high-performance solvers is to design them to work on many small size and independent problems. Many applications already need this functionality, especially for GPUs, which are currently known to be about four to five times more energy efficient than multicore CPUs. We describe the development...
Batched matrix computations on hardware accelerators based on GPUs
Scientific applications require solvers that work on many small size problems that are independent from each other. At the same time, the high-end hardware evolves rapidly and becomes ever more throughput-oriented and thus there is an increasing need for an effective approach to develop energy-efficient, high-performance codes for these small matrix problems that we call batched factorizations....
Batched Lazy Decision Trees
We introduce a batched lazy algorithm for supervised classification using decision trees. It avoids unnecessary visits to irrelevant nodes when it is used to make predictions with either eagerly or lazily trained decision trees. A set of experiments demonstrate that the proposed algorithm can outperform both the conventional and lazy decision tree algorithms in terms of computation time as well...
Publication date: 2001